RNA velocity¶

Bergen et al., 2021 Beyond the scope of computational modeling, the statistical power of the methods depends on the curvature in the phase portrait since a lack of curvature challenges current models to distinguish whether an up- or down-regulation is occurring. The overall curvature of deviation from the steady-state line in the phase portrait is mostly impacted by the ratios of splicing to degradation rates (Box 1), indicating that statistical inference is limited to genes where splicing is faster or comparable to degradation, while a small ratio would yield straight lines rather than an interpretable curvature.

In [1]:
import scvelo as scv
import numpy as np
import pandas as pd
import scipy
import scanpy as sc

import os
In [2]:
sc.settings.vector_friendly = False
scv.set_figure_params(figsize=(2, 5))
In [3]:
os.chdir('/research/peer/fdeckert/FD20200109SPLENO')

Parmameter¶

In [4]:
adata_file = 'data/object/scvelo.h5ad'

Import data¶

In [5]:
adata = sc.read_h5ad('data/object/velocyto.h5ad')
obs = pd.read_csv('data/object/int/meta/meta.csv', index_col=0)
obsm = pd.read_csv('data/object/int/reductions/X_umap/reduction.csv', index_col=0)

Filter velocity matrix and combine with obs and obsm from previous analysis.¶

In [6]:
population_names = ['MPP (1)', 'Ery (1)', 'Ery (2)', 'Ery (3)', 'Ery (4)', 'Ery (5)', 'Ery (6)']
celltype_colours = [
    '#FA9FB5', 
    '#FC9272', 
    '#FB6A4A', 
    '#EF3B2C', 
    '#CB181D', 
    '#A50F15', 
    '#67000D'
]
In [7]:
# Filter obs by Ery annotation and treatment 
obs = obs[obs['leiden_annotation'].isin(population_names)]

# Filter obsm by cell index
obsm = obsm[obsm.index.isin(obs.index)]
In [8]:
# Filter velocity adata by obs 
adata = adata[adata.obs.index.isin(obs.index)]
In [9]:
# Order index to match velocity adata 
obs = obs.reindex(adata.obs.index)
obsm = obsm.reindex(adata.obs.index)

adata.obs = obs
adata.obsm['X_umap'] = obsm.to_numpy()
In [10]:
adata.uns['leiden_annotation_colors'] = celltype_colours

adata.obs['leiden_annotation'] = pd.Series(adata.obs['leiden_annotation'], dtype="category")
adata.obs['leiden_annotation'].cat.reorder_categories(population_names, inplace=True)
In [11]:
sc.pl.umap(adata, color=['leiden', 'tissue', 'treatment', 'leiden_annotation', 'sample_rep', 'cc_phase_class', 'pMt_RNA', 'pHb_RNA', 'pRb_RNA'], wspace=1, ncols=5)
... storing 'sample_name' as categorical
... storing 'sample_rep' as categorical
... storing 'tissue' as categorical
... storing 'treatment' as categorical
... storing 'sample_group' as categorical
... storing 'sample_path' as categorical
... storing 'sample_dir' as categorical
... storing 'cellranger_class' as categorical
... storing 'cc_phase_class' as categorical
... storing 'label_main_immgen' as categorical
... storing 'label_fine_immgen' as categorical
... storing 'label_main_haemosphere' as categorical
... storing 'label_fine_haemosphere' as categorical
... storing 'qc_class' as categorical
... storing 'doublet_cluster' as categorical
... storing 'doublet_class' as categorical
... storing 'doublet_class_int' as categorical
In [12]:
adata_temp = adata.copy()

Scvelo NaCl¶

In [13]:
adata = adata_temp.copy()
adata = adata[adata.obs['treatment']=="NaCl"]
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
Trying to set attribute `.obs` of view, copying.
Filtered out 26496 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
Logarithmized X.
In [14]:
scv.pp.moments(adata)
computing neighbors
    finished (0:00:15) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:01) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
In [15]:
scv.tl.recover_dynamics(adata)
recovering dynamics (using 1/32 cores)
    finished (0:12:47) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
In [16]:
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
computing velocities
    finished (0:00:04) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/32 cores)
    finished (0:00:12) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing terminal states
WARNING: Uncertain or fuzzy root cell identification. Please verify.
    identified 6 regions of root cells and 1 region of end points .
    finished (0:00:01) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
computing latent time using root_cells as prior
    finished (0:00:02) --> added 
    'latent_time', shared time (adata.obs)
In [17]:
scv.set_figure_params(figsize=(2, 5))
scv.pl.velocity_embedding_stream(adata, basis='X_umap', color=['leiden_annotation'])
computing velocity embedding
    finished (0:00:02) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
In [18]:
sc.pl.umap(adata, color=['root_cells', 'end_points', 'velocity_pseudotime', 'latent_time'])
In [19]:
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).dropna().index
scv.tl.differential_kinetic_test(adata, var_names=top_genes, groupby='leiden_annotation')
testing for differential kinetics
    finished (0:05:35) --> added 
    'fit_diff_kinetics', clusters displaying differential kinetics (adata.var)
    'fit_pvals_kinetics', p-values of differential kinetics (adata.var)
In [20]:
adata.write_h5ad('scvelo_nacl.h5ad')
... storing 'fit_diff_kinetics' as categorical

MURK (1)¶

In [21]:
scv.tl.velocity(adata, diff_kinetics=True)
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
computing velocities
    finished (0:00:02) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/32 cores)
    finished (0:00:20) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing latent time using root_cells as prior
    finished (0:00:01) --> added 
    'latent_time', shared time (adata.obs)
In [22]:
scv.set_figure_params(figsize=(2, 5))
scv.pl.velocity_embedding_stream(adata, basis='X_umap', color=['leiden_annotation'])
computing velocity embedding
    finished (0:00:02) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)

Scvelo CpG¶

In [23]:
adata = adata_temp.copy()
adata = adata[adata.obs['treatment']=="CpG"]
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=2000)
Trying to set attribute `.obs` of view, copying.
Filtered out 26926 genes that are detected 20 counts (shared).
Normalized count data: X, spliced, unspliced.
Extracted 2000 highly variable genes.
Logarithmized X.
In [24]:
scv.pp.moments(adata, use_highly_variable=True)
computing neighbors
    finished (0:00:03) --> added 
    'distances' and 'connectivities', weighted adjacency matrices (adata.obsp)
computing moments based on connectivities
    finished (0:00:02) --> added 
    'Ms' and 'Mu', moments of un/spliced abundances (adata.layers)
In [25]:
scv.tl.recover_dynamics(adata)
recovering dynamics (using 1/32 cores)
    finished (0:16:35) --> added 
    'fit_pars', fitted parameters for splicing dynamics (adata.var)
In [26]:
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
computing velocities
    finished (0:00:07) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/32 cores)
    finished (0:00:24) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing terminal states
    identified 6 regions of root cells and 2 regions of end points .
    finished (0:00:04) --> added
    'root_cells', root cells of Markov diffusion process (adata.obs)
    'end_points', end points of Markov diffusion process (adata.obs)
computing latent time using root_cells as prior
    finished (0:00:03) --> added 
    'latent_time', shared time (adata.obs)
In [27]:
scv.set_figure_params(figsize=(2, 5))
scv.pl.velocity_embedding_stream(adata, basis='X_umap', color=['leiden_annotation'])
computing velocity embedding
    finished (0:00:04) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)
In [28]:
sc.pl.umap(adata, color=['root_cells', 'end_points', 'velocity_pseudotime', 'latent_time'])
In [29]:
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).dropna().index
scv.tl.differential_kinetic_test(adata, var_names=top_genes, groupby='leiden_annotation')
testing for differential kinetics
    finished (0:07:28) --> added 
    'fit_diff_kinetics', clusters displaying differential kinetics (adata.var)
    'fit_pvals_kinetics', p-values of differential kinetics (adata.var)
In [30]:
adata.write_h5ad('scvelo_cpg.h5ad')
... storing 'fit_diff_kinetics' as categorical

MURK (1)¶

In [31]:
scv.tl.velocity(adata, diff_kinetics=True)
scv.tl.velocity_graph(adata)
scv.tl.latent_time(adata)
computing velocities
    finished (0:00:03) --> added 
    'velocity', velocity vectors for each individual cell (adata.layers)
computing velocity graph (using 1/32 cores)
    finished (0:00:38) --> added 
    'velocity_graph', sparse matrix with cosine correlations (adata.uns)
computing latent time using root_cells as prior
    finished (0:00:03) --> added 
    'latent_time', shared time (adata.obs)
In [32]:
scv.set_figure_params(figsize=(2, 5))
scv.pl.velocity_embedding_stream(adata, basis='X_umap', color=['leiden_annotation'])
computing velocity embedding
    finished (0:00:04) --> added
    'velocity_umap', embedded velocity vectors (adata.obsm)